Ethical sourcing of data to serve automated mortgage processes

Data is ubiquitous today and can help businesses streamline processes and accelerate their growth. But how can companies source consumer information in a compliant and ethical manner, and why is it in their best interest? Kenon Chen, executive vice president, strategy and growth, at Clear Capital, sits down to discuss what ethical sourcing entails and its relevance in the home lending market today.


Transcription:
Transcripts are generated using a combination of speech recognition software and human transcribers, and may contain errors. Please check the corresponding audio for the authoritative record.

Spencer Lee (00:09):
Good afternoon, and welcome to the Innovation Stage at this Arizent Leaders Forum coming to you live, or live on tape if that's a thing, from Digital Mortgage 2023, our conference in Las Vegas. The topic of conversation today is ethical sourcing of data to serve automated mortgage processes. And my name is Spencer Lee, a reporter at National Mortgage News. I have with me Kenon Chen, an executive at Clear Capital, who's really focused on using data, utilizing data and technology to streamline the mortgage process. He's executive director of growth and strategy at Clear Capital, the real estate valuations firm. We'll dive right in. First of all, thank you, Kenon, for joining us. To start the conversation, I have a real sort of broad question. I think when people see the term "ethical data sourcing" (and I wondered this too), what exactly is that? How do you define that? How does Clear Capital define that?

Kenon Chen (01:08):
I think there's a couple of different definitions. And first of all, thanks for — it's always good to talk to you, Spencer. Thanks for having me here. And I think the first way we think about it is just when you're dealing with direct-source data, is the way that that data is going to be sourced and utilized — is it consistent, if you will, with the actual contract and intended use case of that data to begin with?

Data can be obtained in a lot of different ways, and what we've found is that, especially over the past couple of decades, data is sometimes hard to get and hard to obtain on a national level, so people needed to use a lot of different methods to try to aggregate that data. And so it really comes into play where we want to make sure that the way that data is obtained and then the actual way that that data is utilized does not create issues with the people that are hopefully benefiting from that and benefiting from those products because the data wasn't actually obtained ethically.

Spencer Lee (02:26):
So no surprises?

Kenon Chen (02:30):
No surprises. We're talking about more than just contractual risk or perhaps risks of that data not being available, but we're talking about reputational risks too. We're talking about trust, and a big theme here is around AI and around where the data actually comes from to train models. And so it's really expanding this idea of ethical data sourcing. It's not just about the agreement for how that data is going to be utilized now, but also the traceability, right? How is that data actually originally generated, and how do we know that the end result will be accurate?

Spencer Lee (03:19):
Now, how does that come into play in sort of the specialties that are Clear Capital's focus? And I'm thinking evaluations, appraisals. There might be others, and forgive me if I'm forgetting, but how does that play a role in what you do, what Clear Capital does?

Kenon Chen (03:35):
Certainly, I mean, what we found was that it was difficult to find aggregators, providers of certain types of data, that were transparent as to the sourcing of it and transparent in terms of the intended uses, available uses. And so we've actually taken the approach of really building our own direct relationships where we didn't have that confidence. We've actually made the investment to build those direct relationships ourselves. So we had the ability to audit that, know we are compliant, and then give our clients confidence in their usage of our products, so that there wouldn't be other types of risk, reputational risk and the interruptions of service.


Spencer Lee (04:34):
Now, can that apply in other aspects of the mortgage ecosystem outside of what you do? Do you think it can also come into play there and how?

Kenon Chen (04:45):
I think so. There's some shortcuts that can be taken. It's incredible the way that machine learning and AI have given us, not us, but players and individuals, the ability to do things like web scraping at scale and grab data from just what's on the internet. But how do you verify that the sources that data was grabbed from were not only transparent as to what those uses should be, but also verify the authenticity and accuracy of that data?

So, it's an issue within our space. And I think one of those things that we started tackling during the FHFA tech sprint this year, one of the main themes that came out of that activity there was data trust and how do you not only create a standard for data trust, but how do you remove the need to reverify over and over again? And I think it starts with this ethical data sourcing concept.

Spencer Lee (06:09):
Well, that's a great pivot to what was going to be my next question about some of the benefits of ethical sourcing. Do you have any other examples? We earlier talked about how it leads to trust among the borrower, for the borrower. What other benefits could there be through this approach?

Kenon Chen (06:29):
Yeah, I think that one provides a framework where data creators, data providers feel, again, more comfortable to make valuable data available to the lending industry, because if it's going to be used within transparent usage rights and guidelines, I think that actually creates more access to quality data than if it's going to be abused.

Spencer Lee (07:06):
So transparency seems like maybe the key talking point or the clearest benefit. Now, and I use this term very loosely, but what is, air quotes, unethical data sourcing?

Kenon Chen (07:25):
I think there's two things there. One is, were there actual rights to that data to begin with? And then, are those rights being honored all the way through as that data is used for different purposes? That's one.

Two, is that data verified, if you will, as to whether that should be considered in the input of, especially when we talk about AI, should that data be included in the training of AI models or not? That's one of the big things that we're grappling with, again, from an ethical standpoint. If you know, for instance, that there is a potential bias in some of the data that's being utilized, should you include that or not? I mean, that's where I'm kind of expanding the definition of ethical data sourcing a bit: we have to be responsible in ensuring that data is clean and actually is legitimate, if you will, in some of the models that we use for training.

Spencer Lee (08:39):
Well, I'm glad you brought the AI aspect up. How difficult does that make your job of ensuring the data is accurate? It might benefit it as well in some ways.

Kenon Chen (08:51):
Sure. I mean, we talk a lot about ChatGPT and OpenAI platforms, where it's very difficult to understand what data was maybe considered in the training of those models. We have a higher standard within our industry to ensure that we don't have biases and inaccuracies introduced into our data sets. Data quality is absolutely paramount when it comes to building models for loan decisioning. So for us, we take a lot of care in the curation of the data and the quality of the data.

And then, even thinking about things that maybe, as I think about our automated valuation model, we also think about things that have been historically maybe utilized for understanding things like geographic boundaries and whether or not we want to use that as our reference or not. So for instance, ZIP codes are not necessarily reflective of how a real estate market behaves or what might be the appropriate way to draw a market for evaluation. So those are all things that really need to be considered if you want to have a result that's fair, accurate, and confident.

Spencer Lee (10:30):
It also sounds a little bit like the traceability of the data as well of where it comes from.

Kenon Chen (10:37):
Where it comes from and historically how that was decided.

Spencer Lee (10:42):
Yeah. Now we've talked a little bit about the benefits, but what about the potential consequences of using less-than-quality data, or maybe data that is not ethically sourced? Are there other consequences?

Kenon Chen (10:58):
Yeah, I think if you don't have the transparency in the direct source, and for solution providers, an understanding of where it came from, but also for end clients and then ultimately consumers, if these ethical approaches aren't involved, then it becomes much harder to understand why a particular solution is accurate or not, or able to be trusted. And then, that can create public trust issues and confidence issues, which then perhaps would end up limiting our access to good data in the future. So we have to think more about the long term, not just short-term access.

Spencer Lee (11:50):
Now, I'm curious about publicly sourced data. How does that play into this? There's lots of public data out there. The leading aggregators, that's how they give us their results. How does that play into this topic?

Kenon Chen (12:04):
Yeah, there's all kinds of interesting things going on. Of course we have public records data, something we rely on quite a bit in the mortgage industry and it's a pretty well-known source. But what about things like at the local level, zoning and permitting? And it's interesting to see companies now really try to grapple with public data that should be available and accessible, but at the local level, local government level, they often don't have the infrastructure to serve that data up in a consistent way.

So some of these same techniques that I mentioned, like web scraping, that were perhaps used to obtain data unethically in the past could actually be the very same tactics and tools that let us actually get access to public data in a way that would be very helpful to our industry. So how these techniques are applied, what dataset they're applied to, and how transparent that is become really important as well.

Spencer Lee (13:15):
That's really interesting. It's a very complicated topic, really explained a lot, and I'm sure more will develop. We will see more developments, especially as AI plays a bigger role in how we access data. But I want to thank you for your time today, Kenon. Thanks very much and thanks to all of you for joining this Arizent Leaders Forum.

Kenon Chen:
Thank you.